skip to main content


Search for: All records

Creators/Authors contains: "Bickel, Peter J."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.

     
    more » « less
  2. null (Ed.)
  3. null (Ed.)
  4. There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.

     
    more » « less
  5. Summary

    Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold; first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős–Rényi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.

     
    more » « less